The work value of information 



Oscar C. O. Dahlsten* and Renato Renner 
Institute for Theoretical Physics, ETH Zürich, 8093 Zürich, Switzerland

Elisabeth Rieper 

Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore 

Vlatko Vedral 

Clarendon Laboratory, University of Oxford, Parks Road, Oxford OX1 3PU, United Kingdom
Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore 
Physics Department, National University of Singapore, 2 Science Drive 3, Singapore 117543, Singapore

(Dated: 6 May 2009) 

We present quantitative relations between work and information that are valid both for finite-sized
and internally correlated systems as well as in the thermodynamical limit. We suggest work extraction
should be viewed as a game where the amount of work an agent can extract depends on how well it 
can guess the micro-state of the system. In general it depends both on the agent's knowledge and 
risk-tolerance, because the agent can bet on facts that are not certain and thereby risk failure of the 
work extraction. We derive strikingly simple expressions for the extractable work in the extreme 
cases of effectively zero- and arbitrary risk tolerance respectively, thereby enveloping all cases. Our 
derivation makes a connection between heat engines and the smooth entropy approach. The latter 
has recently extended Shannon theory to encompass finite sized and internally correlated bit strings, 
and our analysis points the way to an analogous extension of statistical mechanics. 

PACS numbers: 03.67, 89.70, 60 



Introduction — The relation between work and information has been the cause of great debate since the beginnings of statistical mechanics. Focal points of this debate include Maxwell's demon, Szilard's engine, Landauer's erasure and Bennett's reversible measurements.

That there should be such a relation can be seen intuitively by noting that harnessing motion, e.g. wind, for one's benefit requires knowing its directionality. In thermodynamical work extraction from the pressure of a gas, one uses the knowledge that the particles are confined and will only push the piston from one known direction. The simplest example of such extraction is perhaps Szilard's seminal engine, described in Figure 1.

In the context of Szilard's engine, previous efforts to quantify the relation between work and information yielded expressions of the type $W = (n - S)kT\ln 2$, where $W$ is the work output, $n$ the number of particles, $S$ the lack of information (entropy) about the positions of individual particles, and $nkT\ln 2$ the amount of work that would be gained if there were no uncertainty [3–5]. Feynman argued that this expression defines the entropy $S$ [?].

In this Letter we revisit this relation in the light of the recently developed smooth entropy approach [?]. This approach has extended Shannon theory, in a simple yet accurate manner, so that it is also valid for finite-sized and correlated bit strings, and it is intriguing to ask whether something analogous can be achieved for statistical mechanics.

We suggest as part of our approach that work extrac- 
tion should be treated as a game where an agent uses its 
information to extract work by guessing aspects of the 
microstate of the system. The agent uses information 
compressing unitaries as part of this process. The agent 
also has to choose a trade-off between risk of failure and 
work extracted if successful. The work value of infor- 
mation therefore depends on the risk tolerance too. To 
recover a simple theory, we focus on the extreme cases of
effectively no- and arbitrary risk tolerance respectively,
as these cases envelop all others. We derive the two
respective work values of information, and discuss the 
consequences. We recover the standard result in the ap- 
propriate limit, but show the work value(s) can in general 
be very different from that. The results hold universally 
for quantum systems and classical systems in the same 
way information compression bounds do. 

The presentation proceeds as follows. We firstly sum- 
marize the smooth-entropy approach. We then describe 
existing ideas on how to use information to extract work, 
in particular the idea of using information compression 
in quantum systems. We go on to define the work ex- 
traction game within which to quantify the work value 
of information. We derive the two statements concern- 
ing the work value of information, and then discuss the 
implications. 

FIG. 1: Work extraction requires information. In a paradigmatic example, Szilard noted that if one has information that a particle in contact with a thermal reservoir of temperature $T$ is in one given side of a box (L or R), one can insert a divider and trap it there [2]. The divider is attached to a weight and, as the particle bounces around due to its thermal energy, the divider can be pushed in a predictable direction, lifting the weight. The work output is $kT\ln 2$ per such particle, and this is therefore the work value of one bit (L or R) in this context.

Smooth entropies — Given a probability distribution $P$ with entries $p_i$, or equivalently a density matrix with eigenvalues $\lambda_i$, there are numerous ways to assign to it a number quantifying the associated ignorance, i.e., entropy. A commonly used function is the Shannon entropy $H = -\sum_i p_i \log p_i$ (we shall also denote it by $H_S$). The reader is less likely to be familiar with the max- and min-entropies, which shall both be needed here. They are so called because $H_{\min} \le H \le H_{\max}$. They are defined by $H_{\max}(P) := \log|\mathrm{supp}(P)|$ and $H_{\min}(P) := -\log(\max_i p_i)$ respectively, where $|\mathrm{supp}(P)|$ is the size of the support of the distribution, $\max_i p_i$ is the peak value, and the logarithm is to base 2. For the distribution $P_{\mathrm{ex}} = [0.5, 0.49998, 0.00001, 0.00001, 0]$, for example,

$$H_{\max}(P_{\mathrm{ex}}) = \log 4 = 2 \quad\text{and}\quad H_{\min}(P_{\mathrm{ex}}) = -\log(1/2) = 1.$$
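A minimal Python sketch of the three entropies for a classical distribution given as a list of probabilities, with logarithms to base 2 as in the text. The function names are our own (hypothetical), not taken from the paper; the sketch reproduces the values quoted for $P_{\mathrm{ex}}$ and checks the ordering $H_{\min} \le H_S \le H_{\max}$.

```python
import math

def shannon_entropy(p):
    """H_S(P) = -sum_i p_i log2 p_i; zero entries contribute nothing (0 log 0 = 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def max_entropy(p):
    """H_max(P) = log2 |supp(P)|, the log of the number of non-zero entries."""
    return math.log2(sum(1 for x in p if x > 0))

def min_entropy(p):
    """H_min(P) = -log2 max_i p_i, fixed entirely by the peak probability."""
    return -math.log2(max(p))

P_ex = [0.5, 0.49998, 0.00001, 0.00001, 0.0]
h_min, h_s, h_max = min_entropy(P_ex), shannon_entropy(P_ex), max_entropy(P_ex)
print(h_min, h_max)           # -> 1.0 2.0
print(h_min <= h_s <= h_max)  # -> True
```

The Shannon entropy of $P_{\mathrm{ex}}$ comes out only slightly above 1, because almost all the probability sits on two outcomes; the tiny entries nevertheless double the support and hence push $H_{\max}$ up to 2.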

An operational meaning of $H_{\max}$ is that it answers the question of how many bits (two-level systems) a memory would need in order to store a message from the distribution. $H_{\min}$, on the other hand, bounds how many out of the $n$ bits are unbiased, in the sense that the marginal distribution on them is uniform. The marginal distribution on any number of bits will always have an entry which is at least of size $p_{\max} := \max_i p_i$. Accordingly, bearing in mind that the marginal distribution is normalised, one cannot find a marginal distribution that is uniform and has more than $1/p_{\max}$ events. Thus no more than $\log(1/p_{\max}) = H_{\min}$ bits can be uniformly distributed. One can moreover say (up to a small term) that $H_{\min}$ bits are completely unknown, as will later be shown in the proof of Theorem II.
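The counting argument can be checked by brute force on small examples. The sketch below (hypothetical helper names; outcomes encoded as bit tuples with L as 0 and R as 1, which is our own convention) finds the largest set of bit positions whose marginal is exactly uniform and compares it with $H_{\min}$. It uses two perfectly correlated boxes, $p(LL) = p(RR) = 1/2$, the same distribution that reappears in Bennett's example later in the text.

```python
import math
from itertools import combinations, product

def marginal(dist, subset):
    """Marginal of a distribution over n-bit tuples on the given bit positions."""
    out = {}
    for bits, prob in dist.items():
        key = tuple(bits[i] for i in subset)
        out[key] = out.get(key, 0.0) + prob
    return out

def is_uniform(marg, k, tol=1e-12):
    """True if each of the 2**k outcomes on k bits has probability 2**-k."""
    return all(abs(marg.get(bits, 0.0) - 2.0 ** -k) < tol
               for bits in product((0, 1), repeat=k))

def largest_uniform_subset(dist, n):
    """Size of the largest set of bit positions whose marginal is uniform."""
    best = 0
    for k in range(1, n + 1):
        if any(is_uniform(marginal(dist, s), k) for s in combinations(range(n), k)):
            best = k
    return best

# Two perfectly correlated boxes, p(LL) = p(RR) = 1/2, encoded as L -> 0, R -> 1.
correlated = {(0, 0): 0.5, (1, 1): 0.5}
h_min = -math.log2(max(correlated.values()))
print(largest_uniform_subset(correlated, 2), h_min)  # -> 1 1.0
```

Each box on its own is uniformly random, but the pair is not, so exactly $H_{\min} = 1$ bit is uniformly distributed, as the bound predicts.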

In practical applications one will normally not care about extremely unlikely events. This motivated the recently suggested [3] modified versions of the two entropies. The modified versions are called the smooth min- and max-entropy respectively, since they typically do not vary much under small changes in the probability distributions. They are defined in the following manner:

$$H^{\epsilon}_{\min}(P) := \max_{\bar P} H_{\min}(\bar P), \tag{1}$$

$$H^{\epsilon}_{\max}(P) := \min_{\bar P} H_{\max}(\bar P). \tag{2}$$

The maximum/minimum is taken over all $\bar P$ such that the statistical distance $d(P, \bar P) \le \epsilon$ (the trace distance in the quantum case). The parameter $\epsilon$ can be interpreted as the maximum probability of events one is prepared to ignore in the analysis and is normally taken to be very small, but non-zero.

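In line with the definition, with probability $p \ge 1 - \epsilon$ a memory of size $H^{\epsilon}_{\max}$ will be enough to store a string from the distribution correctly. For example, with $p \ge 1 - 0.00002$ a memory of size $H^{0.00002}_{\max}(P_{\mathrm{ex}}) = \log 2 = 1$ bit would suffice for $P_{\mathrm{ex}}$ from before, since the smoothing may discard the two entries of size 0.00001.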
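For classical distributions, and reading definitions (1) and (2) as optimisations over normalised distributions, both smooth entropies can be computed greedily. The sketch below is our own illustration (hypothetical function names): for $H^{\epsilon}_{\max}$ it drops the smallest entries while their total mass stays within $\epsilon$, and for $H^{\epsilon}_{\min}$ it shaves the largest entries down to a common level $t$ with total excess $\epsilon$. The second routine assumes the shaved mass can be placed on the other entries without any of them exceeding $t$.

```python
import math

def smooth_max_entropy(p, eps, tol=1e-12):
    """Classical H^eps_max: drop the smallest non-zero entries while their total
    mass stays within eps (that mass is moved onto a kept entry, so the
    statistical distance is at most eps), then return log2 of the support left."""
    support = sorted(x for x in p if x > 0)
    removed, kept = 0.0, len(support)
    for x in support:                      # smallest entries first
        if removed + x > eps + tol or kept == 1:
            break
        removed += x
        kept -= 1
    return math.log2(kept)

def smooth_min_entropy(p, eps):
    """Classical H^eps_min: lower the largest entries to a common level t whose
    total excess is eps, i.e. sum_i max(p_i - t, 0) = eps, and return -log2 t.
    Assumes the shaved-off mass eps fits onto the other entries without any of
    them rising above t (true whenever there are enough small entries)."""
    q = sorted(p, reverse=True)
    mass = 0.0
    for count, x in enumerate(q, start=1):
        mass += x
        nxt = q[count] if count < len(q) else 0.0
        t = (mass - eps) / count           # level if exactly the top `count` entries are cut
        if t >= nxt:
            return -math.log2(t)
    raise ValueError("eps must be smaller than 1")

P_ex = [0.5, 0.49998, 0.00001, 0.00001, 0.0]
print(smooth_max_entropy(P_ex, 0.00002))  # -> 1.0
print(smooth_min_entropy(P_ex, 0.00002))  # about 1.00006, i.e. -log2(0.49998)
```

With $\epsilon = 0.00002$ the two small entries disappear, giving the 1-bit memory quoted above, while the peak can only be lowered from 0.5 to 0.49998, so $H^{\epsilon}_{\min}$ barely moves.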

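By the asymptotic equipartition theorem, both smooth entropies, divided by $n$, converge to the Shannon entropy per particle for $n$ i.i.d. distributed particles as $n \to \infty$ and $\epsilon \to 0$, see e.g. [?]. This is only true for the smooth versions: for $n$ i.i.d. bits the non-smooth quantities stay at $H_{\min}/n = -\log \max(p(L), p(R))$ and $H_{\max}/n = 1$.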
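The convergence can be seen numerically without enumerating all $2^n$ strings, because for i.i.d. bits every string with the same number $k$ of L's has the same probability. The sketch below is our own illustration (hypothetical function name; $p(L) = 0.7$ follows Figure 3, while $\epsilon = 0.01$ is an arbitrary choice). It applies the greedy smoothing group by group, working in log space.

```python
import math

def iid_smooth_entropies(n, p_left, eps):
    """Smooth min- and max-entropy (base 2) of n i.i.d. bits with P(L) = p_left.

    The 2**n strings are grouped by their number k of L's: each of the
    comb(n, k) strings in group k has probability p_left**k * (1-p_left)**(n-k).
    With p_left > 1/2 that probability grows with k, so the greedy smoothing
    (drop the least likely strings for H_max, shave the most likely ones for
    H_min) can act group by group. Log space avoids underflow; converting the
    big integer counts to floats limits n to roughly 1000."""
    assert 0.5 < p_left < 1 and 0 < eps < 1
    ln_l, ln_r = math.log(p_left), math.log(1 - p_left)
    log_prob = [k * ln_l + (n - k) * ln_r for k in range(n + 1)]  # per string
    count = [math.comb(n, k) for k in range(n + 1)]
    mass = [math.exp(math.log(count[k]) + log_prob[k]) for k in range(n + 1)]

    # H^eps_max: remove the least likely strings (small k first), total mass <= eps.
    budget, removed = eps, 0
    for k in range(n + 1):
        if mass[k] <= budget:
            budget -= mass[k]
            removed += count[k]
        else:
            if budget > 0:  # remove only part of this group
                removed += min(count[k], int(math.exp(math.log(budget) - log_prob[k])))
            break
    h_max = math.log2(2 ** n - removed)

    # H^eps_min: cut the most likely strings (large k first) down to a level t
    # with total excess eps, solving mass_above - strings_above * t = eps.
    above, strings = 0.0, 0
    for k in range(n, -1, -1):
        above += mass[k]
        strings += count[k]
        next_level = math.exp(log_prob[k - 1]) if k > 0 else 0.0
        t = (above - eps) / strings
        if t >= next_level:
            break
    return -math.log2(t), h_max

h_s = -(0.7 * math.log2(0.7) + 0.3 * math.log2(0.3))  # Shannon entropy per bit
for n in (100, 1000):
    h_min, h_max = iid_smooth_entropies(n, 0.7, 0.01)
    print(n, round(h_min / n, 3), round(h_s, 3), round(h_max / n, 3))
```

The per-bit smooth min- and max-entropies bracket $H_S$ and close in on it as $n$ grows, though slowly: the gap shrinks roughly like $1/\sqrt{n}$, which is why large $n$ is needed before the two coincide.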

Readers familiar with the smooth entropy literature can note that what we call $H^{\epsilon}_{\max}$ is the smooth Rényi entropy of order 0 and not that of order 1/2, but these only differ by at most $\log(1/\epsilon)$.

Szilard's engine and Bennett's development — In this work we will consider the work value of information in a quite general work extraction scenario. To understand why we chose this scenario it is instructive to recall certain specific examples existing in the literature. Bennett, in particular, considered $n$ Szilard engines (like in Figure 1) together extracting work from a heat bath [?]. The experimenter's knowledge is encoded in the probability distribution on particle positions $\{L, R\}^n$. Each box has a work value $c = kT\ln 2$ associated with knowing L or R perfectly. Thus if all boxes are either fully known or completely unknown, $W = (n - n_u)kT\ln 2$, where $n_u$ is the number of completely unknown boxes. Bennett notes, crucially, that correlations can be exploited too, even if the marginal distributions on the bits are uniformly random. The experimenter can implement a reversible interaction between the boxes to compress the total randomness/information (which is constrained by the correlations) onto individual bits, so that the others can then be used for work extraction. As a simple example, let $n = 2$, $p(LL) = p(RR) = 1/2$. Then performing the reversible so-called controlled-NOT interaction [?], with the first box as control, would yield $p(LL) = p(RL) = 1/2$, so that the second box, now always in L, could be used to extract $c$ work [?].
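Bennett's two-box example can be traced step by step in a short Python sketch (hypothetical helper names; boxes treated as classical labels "L"/"R", with the first box as control as described above). It pushes the distribution through the controlled-NOT and lists which boxes become fully known, each of which is then worth $c$ of work.

```python
def cnot(state):
    """Controlled-NOT on a classical pair of boxes: flip box 2 if box 1 is R."""
    b1, b2 = state
    if b1 == "R":
        b2 = "L" if b2 == "R" else "R"
    return (b1, b2)

def apply(dist, gate):
    """Push a distribution through a reversible (bijective) gate."""
    out = {}
    for state, prob in dist.items():
        new = gate(state)
        out[new] = out.get(new, 0.0) + prob
    return out

def known_boxes(dist):
    """Indices (0-based) of boxes that take the same position in every likely outcome."""
    outcomes = [s for s, p in dist.items() if p > 0]
    return [i for i in range(len(outcomes[0]))
            if len({s[i] for s in outcomes}) == 1]

before = {("L", "L"): 0.5, ("R", "R"): 0.5}
after = apply(before, cnot)
print(after)                                     # {('L', 'L'): 0.5, ('R', 'L'): 0.5}
print(known_boxes(before), known_boxes(after))   # [] [1]
```

Before the interaction neither box is individually known, so no work can be extracted with certainty; afterwards box index 1 (the second box) is always in L and yields $c$.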

The considerations, like many other information theoretical arguments, also apply to quantum systems [?]. In fact, the nano and quantum regimes should be the focus for implementations of these ideas, due to the probably unavoidable presence of friction in macroscopic systems. We then replace the $n$ bits with $n$ qubits, and the distribution on $\{L, R\}^n$ with a density matrix $\rho$ representing the state of the $n$ qubits [13]. The reversible interactions compressing the information are then unitary, as discussed e.g. in [11] and, in a closely related setting, in [12, ?]. We may assume $\rho$ is diagonal, i.e. a classical distribution, because the experimenter would otherwise apply a unitary to diagonalise it, so as to minimise the uncertainty in the basis in question.

FIG. 2: Unitary interaction compressing the information such that some bits are made fully known, some are biased and some uniformly random. Darker colour indicates higher probability density. Each bit represents a box such as in Figure 1. The agent can use this process to remove or minimise fluctuations in the work output, since known bits will not yield any fluctuations. Note that it is not always obvious whether a box should be coupled to the weight or not. If the box is only biased but not certain, then a risk-averse agent would not use it, but another agent may.

Our work extraction scenario — We now construct a work extraction game from the above examples, wherein an agent tries to extract as much work as possible given its information. We say the agent succeeds in extracting work if and only if it lifts a weight (or does work against an analogous counter-force) to a predetermined level. We assume, as is common, that the relative thermal fluctuations in the piston are negligible because all particles are working on the same piston; see [?] for more discussion. The game is defined so that the agent:

• Is given $n$ bits / qubits of work value $c$ and a distribution on $\{L, R\}^n$ / a density matrix on $\{|L\rangle, |R\rangle\}^{\otimes n}$.

• Presets a unitary to be applied to the particles. 

• Presets which of the boxes to use for work extrac- 
tion after the unitary. 

• Presets the weight to be lifted. 

• Then interferes no more in the extraction. 
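The rules of the game can be expressed as a small evaluator for one preset strategy in the classical (diagonal) case. This is our own sketch with hypothetical names, and the distribution in the usage example is illustrative rather than taken from the paper: two independent boxes, each on side 0 with probability 0.9. Raising the preset weight from $0$ to $c$ to $2c$ lowers the success probability from 1 to 0.9 to 0.81, which is the trade-off between yield and risk described in the text.

```python
from itertools import product

def play(dist, permutation, used_bits, guess, c=1.0):
    """Evaluate one preset strategy of the work extraction game (classical case).

    dist        -- {n-bit tuple: probability}, the agent's knowledge
    permutation -- reversible map on n-bit tuples, preset before extraction
    used_bits   -- indices of the boxes coupled to the weight
    guess       -- the side (0 or 1) the agent bets on for each used box
    The preset weight corresponds to len(used_bits) * c of work; extraction
    succeeds only if every used box is on the guessed side after the map."""
    p_success = 0.0
    for bits, prob in dist.items():
        out = permutation(bits)
        if all(out[i] == g for i, g in zip(used_bits, guess)):
            p_success += prob
    return p_success, len(used_bits) * c

def identity(bits):
    return bits

# Two independent boxes, each on side 0 with probability 0.9 (illustrative numbers).
dist = {bits: 0.9 ** bits.count(0) * 0.1 ** bits.count(1)
        for bits in product((0, 1), repeat=2)}
for used in [(), (0,), (0, 1)]:
    p, w = play(dist, identity, used, (0,) * len(used))
    print(used, round(p, 2), w)
```

Here the identity map is used because independent boxes cannot be compressed further; for correlated boxes a non-trivial permutation, such as the controlled-NOT in Bennett's example, can raise the success probability at a fixed weight.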

The work value of information — The above is a well- 
defined, physically concrete, and quite general scenario 
within which to quantify the work value of information. 
The agent has several choices to make. It is natural to 
assume the experimenter chooses the most information 
compressing unitary. Using uncertain bits and choosing 
a heavier weight both increase the possible yield and the 
probability of failure. To simplify the situation we focus 
on the extreme cases of an agent accepting effectively no 
risk of failure, and another accepting effectively arbitrary 
risk of failure. The following Theorems give the work value of information in the two respective cases and hold for single realizations.

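Theorem I: Except with probability $p \le \epsilon$, the agent can be certain to extract

$$W = (n - H^{\epsilon}_{\max})\,c$$

work, and no more.

Theorem II: Except with probability $p \le 2\epsilon$, for an agent willing to risk failing to extract work (provided it still succeeds with probability at least $\epsilon$), the extracted work obeys

$$W \le \left(n - H^{\epsilon}_{\min} + \log\frac{1}{\epsilon}\right)c.$$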

We proceed to outline the proofs of the Theorems, omitting some tedious but straightforward calculations for brevity.

Theorem I follows from the following argument. By a standard smooth entropy result, no fewer than $H^{\epsilon}_{\max}$ bits can be uncertain (except with $p \le \epsilon$), so an agent unwilling to use any uncertain bits cannot extract more than $(n - H^{\epsilon}_{\max})c$ work. That the agent can in fact extract that amount of work with certainty follows from noting that the agent can apply the unitary which takes the initial distribution to a state $[p_1, \ldots, p_k, 0, \ldots, 0]$, where $k$ is the size of the support of the post-smoothing distribution. Then only $H^{\epsilon}_{\max} = \log k$ of the bits (rounded up to $\lceil \log k \rceil$ when $k$ is not a power of two) are uncertain, and the agent can use the remaining ones to extract work. That concludes the proof of Theorem I.
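The compressing step in this proof can be sketched classically as a relabelling of outcomes. In the sketch below (our own illustration; the function name is hypothetical, and the input is assumed to be the post-smoothing distribution) the $k$ outcomes with non-zero probability are mapped to the binary strings $0, \ldots, k-1$. These strings share $n - \lceil \log k \rceil$ leading zeros, which are the bits the agent can then use with certainty.

```python
import math

def compress(dist, n):
    """Reversible relabelling of the 2**n outcomes (a permutation) that sends the
    k outcomes with non-zero probability to the strings 0..k-1, most likely first.
    Zero-probability outcomes fill the remaining labels and are not tracked.
    Returns the relabelled distribution and the number of certain bits."""
    support = sorted((b for b, p in dist.items() if p > 0), key=lambda b: -dist[b])
    k = len(support)
    uncertain = math.ceil(math.log2(k))       # only the last ceil(log2 k) bits vary
    new = {}
    for index, bits in enumerate(support):
        label = tuple(int(ch) for ch in format(index, f"0{n}b"))  # index as n-bit string
        new[label] = dist[bits]
    return new, n - uncertain

dist = {(0, 1, 1): 0.6, (1, 0, 1): 0.4}
compressed, certain = compress(dist, 3)
print(compressed, certain)   # {(0, 0, 0): 0.6, (0, 0, 1): 0.4} 2
```

With two possible outcomes on three boxes, the first two boxes end up always at 0, so $2c$ of work can be extracted with certainty, matching $(n - \lceil \log k \rceil)c$.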

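We now proceed to prove Theorem II. We prove, crucially, that for the agent to guess all bits it is using successfully with $p \ge \epsilon$, it has to desist from using at least $H_{\min} + \log\epsilon$ bits. To see this, let $\tilde n$ be the number of bits used and $\tilde p_{\max}$ the peak probability of the marginal distribution on that subset (the preset unitary does not change $p_{\max}$, and hence not $H_{\min}$). Each marginal entry is a sum of $2^{n-\tilde n}$ entries of the full distribution, so $\tilde p_{\max} \le 2^{n-\tilde n} p_{\max}$. The probability of guessing all used bits correctly is at most $\tilde p_{\max}$, so success with $p \ge \epsilon$ requires $\tilde p_{\max} \ge \epsilon$, and hence $\tilde n \le n - (H_{\min} + \log\epsilon)$. Thus at least $H_{\min} + \log\epsilon$ bits have to be traced out to get a chance $p \ge \epsilon$ of guessing the remaining bits correctly, in which case $W \le (n - H_{\min} + \log(1/\epsilon))c$. It can moreover be shown that the agent cannot exceed this by using an even larger set of bits, nor by varying the counterweight (under the restriction $p \ge \epsilon$). Finally, to recover the smooth version, we go through the same derivation for an $\epsilon$-close distribution, yielding Theorem II, which accordingly holds except with probability at most $2\epsilon$.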
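The counting bound can be checked by brute force for a few bits. The sketch below is our own (hypothetical function name; the preset unitary is taken to be the identity, which is enough here since the bound depends only on $p_{\max}$). For four i.i.d. boxes with $p(L) = 0.7$ and a required success probability $\epsilon = 0.3$, it finds the largest set of boxes whose best guess still succeeds with probability at least $\epsilon$ and compares that with the non-smooth bound $n - H_{\min} + \log(1/\epsilon)$.

```python
import math
from itertools import combinations, product

def max_usable_bits(dist, n, eps):
    """Largest number of bits the agent can bet on while still guessing all of
    them correctly with probability >= eps (best guess = peak of the marginal).
    Brute force over subsets, so only suitable for small n."""
    best = 0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            marg = {}
            for bits, p in dist.items():
                key = tuple(bits[i] for i in subset)
                marg[key] = marg.get(key, 0.0) + p
            if max(marg.values()) >= eps:
                best = k
    return best

n, eps = 4, 0.3
# Four independent boxes, side 0 (L) with probability 0.7 each.
dist = {bits: 0.7 ** bits.count(0) * 0.3 ** bits.count(1)
        for bits in product((0, 1), repeat=n)}
h_min = -math.log2(max(dist.values()))
bound = n - h_min + math.log2(1 / eps)
print(max_usable_bits(dist, n, eps), round(bound, 2))  # -> 3 3.68
```

Betting on three boxes still succeeds with probability $0.7^3 = 0.343 \ge 0.3$, while all four would succeed only with probability $0.2401$, so three boxes is the optimum, consistent with the bound of about 3.68.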

Discussion — Four illustrative examples of distributions are discussed in Table I.

It is moreover interesting that the standard thermodynamical heat engine work extraction scheme is a strategy amongst those considered here, corresponding to the unitary being the identity and all bits being selected. In general this is a suboptimal strategy (it could not, for example, extract any work given the fourth example), although in the thermodynamical limit (of the first example in Table I) it is, interestingly, optimal. This is because the standard thermodynamical work from $p(L)n$ ($p(R)n$) particles on the left (right) of the divider can be shown to be given by $W = n(1 - H_S)kT\ln 2$. This quantity is identical to the extractable work if one also allows the information compressing unitary; see the first example in Table I.
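One way to arrive at the quoted expression is sketched below under our own assumptions, since the text does not spell out the mechanism: the particles form an ideal gas at temperature $T$, and the divider starts in the middle of the box and is moved quasi-statically and isothermally until each compartment's volume fraction equals its particle fraction, $p(L)$ on the left and $p(R)$ on the right. Summing the isothermal work $NkT\ln(V_{\mathrm{final}}/V_{\mathrm{initial}})$ of the two compartments gives

$$
\begin{aligned}
W &= p(L)\,n\,kT\ln\frac{p(L)}{1/2} + p(R)\,n\,kT\ln\frac{p(R)}{1/2} \\
  &= n\,kT\left[\ln 2 + p(L)\ln p(L) + p(R)\ln p(R)\right] \\
  &= n\,kT\ln 2\left[1 + p(L)\log p(L) + p(R)\log p(R)\right] \\
  &= n(1 - H_S)\,kT\ln 2 ,
\end{aligned}
$$

where $\log$ is to base 2 and $H_S = -p(L)\log p(L) - p(R)\log p(R)$ is the Shannon entropy per particle. The compartment that shrinks contributes negative work, which the logarithm accounts for automatically.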

Whilst this type of work extraction is experimentally more challenging than standard thermodynamical work extraction, it is nevertheless significantly easier than full quantum computation. One will in general not require a universal set of unitaries nor a high accuracy in order to demonstrate non-trivial work extraction (or 'resetting', which is the inverse of work extraction [5]) via information compression. It seems likely that at least some of the multitude of methods being developed for performing quantum gates will be suitable for this purpose, and that they will be sufficiently good for this application significantly earlier than they can be used for quantum computing [?]. Similar experiments have already been performed in the context of NMR algorithmic cooling [13, 18].

TABLE I: Examples of distributions and their work values. We set the work value of a box to $c = kT\ln 2$. The notation $[(\cdot)]^{\otimes n}$ means that the distribution is combined with itself independently $n$ times. The first two examples are discussed further in Figure 3. The third example shows that both statements are needed, as they do not in general approximate one another. The fourth distribution is an example where access to the information compressing unitary leads to an almost maximal amount of work being extractable.

We finally stress that two agents with differing knowl- 
edge about the same system and/or differing risk tol- 
erance can extract different amounts of work; the ex- 
tractable work must be seen as subjective in that sense. 

The results of this work will be presented in detail in 
forthcoming publications. 
Conclusion — We have quantified the relation between work and information, employing the smooth-entropy approach. We suggested that work extraction is a guessing game where the amount of work an agent can extract depends both on its knowledge and on its risk tolerance. We nevertheless recovered a simple relation between work and information by noting that all risk tolerances lie in between zero and arbitrary risk tolerance, and by deriving the work value of information for those two cases. The results point the way to achieving for statistical mechanics what smooth entropies accomplished for information theory more generally. A natural next step is to apply our approach to related work/information scenarios, including NMR algorithmic cooling.



FIG. 3: Three entropies, $H_{\min}$, $H_S$ and $H_{\max}$, are evaluated for $n$ uncorrelated bits with $p(L) = 0.7$ (a choice motivated by the experiment [?]). One sees that in the $n \to \infty$ limit the entropies coincide. This is because in this limit all bits are (after an appropriate unitary) uniformly random or fully known. Accordingly, the first example in Table I has the same amount of 'min' and 'max' work. The figure also shows that $n$ needs to be large in general for the entropies to coincide, which is why the second example in Table I shows that the type I and type II extractable work are significantly different for $n = 1000$. For more details on this figure, see [16].